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1.  Introduction.  In  this  paper,  we  give  a  progress  report  on  the  development  of  a 
new  family  of  domain  decomposition  methods  for  the  solution  of  Helmholtz’s  equation. 
We  present  three  algorithms  based  on  overlapping  Schwarz  methods;  in  our  favorite 
method  we  proceed  to  the  continuous  finite  element  approximation  of  the  Helmholtz’s 
equation  through  a  sequence  of  discontinuous  iterates.  While  this  is,  quite  possibly,  a 
new  type  of  overlapping  Schwarz  methods,  we  have  been  inspired  to  develop  this  idea 
by  the  thesis  of  Despres  [4] . 

The  basic  domain  decomposition  algorithm  considered  by  Despres  is  defined  as 
follows:  The  given  region  Q  is  divided  into  two  nonoverlapping  subregions  fli  and 
and  the  iteration  is  advanced  by  simultaneously  solving 

-Au]+1  -  k2u]+1  =  /  iesi3, 

(1)  du™+1/dnint  -  ik-Uj+1  =  ~dunonJ dnout  -  ikunout  iG, 

du™+1  / driint  ~  ikunj+1  =  g  x  €  <9fi, 

in  the  two  subregions.  Here  /  and  g  are  data  given  for  the  original  problem,  k  is  a  real 
parameter,  and  T  the  interface,  i.e.  the  parts  common  to  and  <9^2,  the  boundaries 
of  subregions.  We  note  that  Sommerfeld-type  boundary  conditions  are  used  and  that 
the  subregions  themselves  can  be  the  union  of  a  number  of  disjoint  regions  as  in  the  case 
when  is  cut  into  strips  and  the  strips  colored  using  two  colors.  The  iterates  generally 
have  jumps  across  the  interface;  the  jump  will  go  to  zero  as  the  iteration  converges. 

In  his  thesis,  Despres  proves  convergence  in  a  relatively  weak  sense  for  a  quite 
general  decomposition  into  nonoverlapping  subregions  and  also  conducts  a  detailed  the¬ 
oretical  and  numerical  study  for  a  rectangular  Q  cut  into  two.  The  convergence  is  slow, 
but  it  is  shown  that  under- relaxation  can  lead  to  an  improvement.  Despres  also  briefly 
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considers  the  use  of  overlap;  it  is  shown  that  this  leads  to  a  considerable  improvement 
in  the  rate  of  convergence  of  the  iteration  for  the  two  subregion  case. 

In  the  limit  of  increasing  domain  diameter,  the  Sommerfeld  boundary  condition 
provides  the  correct  far-field  condition  for  propagation  of  waves  in  the  frequency  domain, 
but  for  a  bounded  region  it  does  not  provide  perfect  transparency.  It  can  be  argued 
that  an  alternative  boundary  condition,  which  more  closely  approximates  the  correct 
nonlocal  non-reflecting  boundary  condition,  would  lead  to  more  rapid  convergence. 
These  ideas  have  indeed  been  tested  with  some  success  by  Ghanemi  [5]  and  others. 
This  essentially  amounts  to  replacing  the  ik  terms  in  the  interface  condition  (1)  with 
ikT,  where  T  is  an  appropriate  nonlocal  operator.  See  also  [9]  for  work  more  closely 
related  to  ours. 

In  our  own  work,  we  have  instead  attempted  to  use  three  ideas  that  have  proven 
successful  in  studies  of  other  types  of  problems:  We  have  focused  almost  exclusively 
on  methods  based  on  overlapping  decompositions  of  the  region  Q.  In  addition,  we  are 
exploring  the  possible  benefits  of  a  coarse  solver  as  a  part  of  our  preconditioner;  we  note 
that  the  use  of  a  coarse  space  correction  is  required  to  establish  convergence  of  domain 
decomposition  algorithms  for  a  class  of  nonsymmetric  and  indefinite  elliptic  problems 
previously  considered  by  Cai  and  Widlund  [2,  3].  We  also  take  advantage  of  well-known 
accelerators  of  the  basic  iteration  schemes,  in  particular  the  GMRES  algorithm. 

We  refer  to  Smith,  Bjprstad,  and  Gropp  [11]  for  an  introduction  to  domain  decom¬ 
position  methods  in  general,  and  these  ideas  in  particular.  We  note  that  there  are  a 
number  of  variants  of  the  Schwarz  algorithms:  additive,  hybrid,  restricted,  etc.  In  our 
work,  we  are  now  focusing  on  the  classical,  multiplicative  algorithm. 

2.  Differential  and  Discrete  Model  Problems.  We  consider  a  Helmholtz 
model  problem  given  by 

(2)  —  A u  —  k2u  =  /  x  6  £2,  du/dn  —  iku  =  g  x  G  dil. 


where  Q  is  a  bounded  two  or  three-dimensional  region.  This  equation  is  uniquely 
solvable,  and  we  note  that  the  boundary  condition,  said  to  be  of  Sommerfeld  type,  is 
essential  in  the  proof  of  this  fact. 

We  use  Green’s  formula,  and  complex  conjugation  of  the  test  functions,  to  convert 
(2)  into  variational  form:  Find  u  G  iF1(fI)  such  that, 


b(u ,  r) 


uvds 


f  fvdx  +  f  ; 

Ja  JdQ 


gvds 


F{v)  VuGiF1^). 


Finite  element  problems  can  now  be  defined  straightforwardly  by  replacing  FT1 (£2)  by 
a  suitable  conforming  finite  element  space.  So  far,  we  have  worked  mainly  with  lower 
order  elements  but  have  made  progress  towards  extending  our  studies  and  numerical 
experiments  to  spectral  elements. 

Our  interest  in  the  spectral  element  case  has  been  inspired  by  the  work  of  Ihlenburg 
and  Babuska  [7,  8,  6]  and  the  thesis  by  Melenk  [10].  They  have  considered  the  well- 
posedness  of  the  original  problem  and  different  finite  element  discretizations  and  proven, 
for  a  model  problem  in  one  dimension,  that  the  basic  estimate 


u\h1  <  Ck\F\H-i 
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holds.  In  the  finite  element  case,  an  assumption  of  hk  <  1  is  used.  The  constant  C 
is  independent  of  p.  the  degree  of  the  finite  elements.  Ihlenburg  has  also  conducted 
extensive  numerical  experiments  which  suggest  that  this  bound  also  holds  for  problems 
in  two  or  three  dimensions.  Error  bounds  of  the  following  form  are  also  given  for  p  =  1 
and  kh,  small  enough: 

\error\jj\  <  C\9  +  C2k82  where  8  =  best  if1— error. 

With  oscillatory  solutions  typical,  we  can  expect  8  to  be  on  the  order  of  kh.  In  that 
case,  the  second  term,  which  is  due  to  the  phase  error,  will  dominate  unless  k2h  is  on 
the  order  of  1.  Larger  values  of  p  appear  attractive  since  Ihlenburg  and  Babuska  have 
also  shown  that 


\error\m  <  <F(Ci  +  C282)  +  C3kd2p . 

Here  9  =  h,k/2p ,  and  the  phase  error  is  now  relatively  less  important. 

3.  Overlapping  Schwarz  Algorithms.  The  basic  multiplicative,  one-level  over¬ 
lapping  Schwarz  method  can  be  described  as  follows:  Let  {0,}  be  a  set  of  open  sub- 
regions  that  covers  the  given  region  Q.  Just  as  in  the  strip  case  of  Section  1,  each 
subregion  Qj  can  have  many  disconnected  components;  it  is  often  profitable  to  color  the 
subregions  of  an  original  overlapping  decomposition  of  Q  using  different  colors  for  any 
pair  of  subregions  that  intersect.  The  original  set  of  subregions  can  then  be  partitioned 
into  sets  of  subregions,  one  for  each  color,  effectively  reducing  the  number  of  subregions. 
This  decreases  the  number  of  fractional  steps  of  our  Schwarz  methods  and  helps  make 
the  algorithms  parallel.  The  number  of  colors  is  denoted  by  .J . 

In  many  cases,  it  is  appropriate  to  view  a  multiplicative  Schwarz  method  as  follows: 
A  full  iteration  step  proceeds  through  J  fractional  steps, 

un+j/J  -  «*+(*'—: 1)/J  =  Pj{u-  un+{j~1)/J), 

where  Pj,j  =  1 is  a  projection  onto  a  subspace  Vj  related  to  Qj  and  u  is  the 
exact  finite  element  solution.  Such  a  fractional  step  can  be  more  easily  understood  by 
rewriting  it  in  the  form 

(3)  -  u"^-1’/-7,  v)  =  F{v)  -  b(un+j-^lJ,  v )  Vu  6  Vj. 

The  choice  of  the  local  sesquilinear  form  bj  ( • ,  •)  and  the  space  Vj  determines  the  projec¬ 
tion  Pj.  We  will  examine  several  choices  one  of  which  has  discontinuous  iterates,  and 
for  it  we  will  need  an  alternative  to  formula  (3) .  Introducing  a  splitting  of  the  Helmholtz 
form  with  respect  to  each  Qj  and  its  complement 

(4)  b(u,v)  =  bJ(u,v)  +  bcJ(u,v), 
which  we  will  further  describe  below,  we  can  replace  (3)  by 

(5)  bj{un+j/J,  v )  -  F{v)  -  bcJ{un+^-V>lJ, .36). 

It  is  also  easy  to  introduce  a  coarse  space  correction  and  a  second  level  into  the 
algorithm.  An  additional  fractional  step  is  then  used;  we  choose  to  make  this  correction 
prior  to  the  other,  local  steps.  In  our  experiments,  we  have  so  far  only  used  the  same 
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low  order  finite  element  method  on  a  coarser  mesh.  The  space  related  to  this  mesh  and 
fractional  step  is  denoted  by  Vo,  and  we  use  formula  (3)  to  define  the  related,  special 
update.  We  note  that  all  the  solvers  used  in  the  fractional  steps  are  smaller,  often  much 
smaller,  instances  of  the  original  problem. 

One  difficulty  with  faithfully  implementing  a  generalization  of  Despres’  algorithm 
with  overlap  is  the  appearance  and  disappearance  of  multiple  values,  i.e.  jumps,  across 
different  parts  of  the  interface  T,  which  is  now  defined  by 


U dQj  \  dQ. 

In  our  first  two  algorithms,  we  avoid  jumps  and  use  traditional  domain  decomposition 
techniques,  but  in  the  third  and  most  successful  algorithm  jumps  in  the  solution  are 
fully  accommodated. 

The  three  algorithms  can  now  be  defined  in  terms  of  the  sesquilinear  forms  bj(-,  •) 
and  the  subspaces  Vj. 

ALG1  An  update  with  zero  Dirichlet  condition  on  dQj  \  Q  is  used  in  th ejth  fractional 
step;  this  preserves  the  continuity  of  the  iterates.  The  test  functions  of  Vj  then 
vanish  at  all  mesh  points  in  the  closure  of  QV  The  sesquilinear  form  is  defined 

by 

bj(u,  v)=  Vh  •  V/vdx  —  ik  I  uvds. 

JQj  JdncdQj 

We  note  that  for  an  interior  subregion  we  cannot  guarantee  solvability  of  the 
subproblem  except  by  making  the  diameters  of  the  components  of  the  subregion 
Qj  small  enough.  The  same  preconditioner  can  also  be  obtained  by  a  matrix 
splitting  based  on  the  diagonal  blocks  of  variables  associated  with  the  nodes  in 
Q  j  and  on  dQ  fl  dQ  j . 

ALG2  The  sesquilinear  form  is  chosen  as 

h;(«,  v)=  (  Y«  •  Vvdx  -  ik  f  uvds, 

J  £lj  J  d^lj 

and  the  elements  of  the  space  Vj  are  now  required  to  vanish  at  all  the  nodes 
in  the  open  set  QCj.  We  require  that  the  solution  of  equation  (5)  belong  to  the 
same  space.  Continuity  of  the  iterates  is  maintained  by  overwriting  all  the  old 
values  at  all  the  nodes  of  the  closure  of  Qj. 

ALG3  The  same  Vj  and  bj(-,  ■)  are  used  as  in  ALG2,  but  both  the  old  and  the  new 
values  on  dQj  are  saved.  This  will  typically  produce  a  jump  across  this  part  of 
the  interface.  At  the  same  time  the  jump  across  the  interface  interior  to  Q  j  is 
eliminated.  Further  details  will  be  given;  we  note  that  the  new  features  of  this 
algorithm  have  required  a  redesign  of  our  data  structures,  and  that  there  are 
consequences  of  the  jumps  that  need  careful  scrutiny  in  order  to  understand 
ALG3  correctly. 

Since  we  use  completely  standard  techniques  for  the  coarse  grid  correction,  we 
describe  only  the  fine  grid  fractional  steps  of  ALG3  in  some  detail.  We  must  first 
realize  that  the  lack  of  continuity  across  the  interface  F  forces  us  to  use  broken  norms, 
i.e.  to  replace  integrals  over  Q  and  the  Qj  by  sums  of  integrals  over  atomic  subregions 
defined  by  T.  We  proceed  by  finding  the  common  refinement  of  all  splittings  like  (4). 
Let  {Ag|g  =  1, . .  .  ,Q)  be  the  open  atoms  generated  by  { T2 7 } ,  i.e.  the  collection  of  the 
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largest  open  sets  satisfying  Aq  C  Qj  or  Aq  C  Qc-  for  all  j  and  q.  We  refine  (4)  by 
expressing  each  term  as  a  sum  of  Helmholtz  forms  defined  on  the  collection  of  open 
atoms  contained  in  that  region.  Thus, 

M<vu) 

AqCSij 

bjiuiv)  =  M^)- 

These  are  the  splittings  needed  to  solve  equation  (5)  in  the  presence  of  jumps  and 
to  represent  the  solution  in  atomic  form  for  further  steps.  The  sesquilinear  forms 
corresponding  to  the  individual  atoms  are  defined  by 

bn(u,v)  =  a\(u,v)  —  ik(u,v)z  —  ik(u,  v)p_  +  ik(u,  t>)p+ 

i  <2  1  q  1  q 

where, 


Eg  =  0Aq 

j.  n  dtt 

=  u| 

\dAq  n  OQ . 

{ 

fip  A„j  =  l,...,jJ 

=  u| 

\  dAq  n  dQ  j 

l 

Qj  D  Aq ,  j  =  1  ,...,■/ 1 

For  a  valid  splitting,  we  have  to  assume  that  the  boundaries  of  any  two  intersecting 
subdomains,  iit  and  Qj,  must  have  the  same  unit  normal  where  they  intersect,  except 
on  sets  of  measure  zero. 

The  principal  difficulty  in  implementing  a  multiplicative  Schwarz  cycle  based  on  (5) 
is  that  it  requires  that  multiple  values  be  kept  at  the  atom  interfaces  T,,  and  r+  because 
continuity  is  not  enforced.  Therefore,  we  represent  the  solution  function  u  as  an  element 
in  the  direct  product  of  finite  element  spaces,  one  for  each  atom.  At  iteration  n  +  j/J 
of  the  algorithm,  see  (5),  the  right  hand  side  is  computed  atomic  subregion  by  atomic 
subregion.  It  is  therefore  practical  to  store  the  nodal  values  of  each  atom  separately. 
We  also  note  that  the  test  functions  v  are  continuous  functions  in  the  closure  of  and 
that  the  solution  un+^J  of  (5)  is  continuous  in  the  closure  of  Qj.  Once  it  is  found,  it  is 
scattered  to  the  individual  atoms  of  Q  j.  The  set  of  nodal  values  of  the  iterate  is  exactly 
what  is  required  in  the  computation  of  the  contribution  from  that  atom  to  the  next  set 
of  right  hand  sides. 

After  a  full  sweep  through  all  the  subregions,  the  residuals  interior  to  each  atomic 
subregion  are  zero,  and  across  any  segment  of  T,  the  solution  is  either  continuous  or 
satisfies  the  flux  condition.  Then,  by  Green’s  formula,  the  approximate  solution  u" 
satisfies 

b(un,  u)  =  F( v)  +  ik  j  [un\vds  Vu  €  V. 

In  the  limit,  the  jump  goes  to  zero;  this  conveniently  signals  convergence.  At  this 
point  of  the  iteration,  the  residuals  can  be  computed  from  the  jump  directly,  but  also 
conventionally  through  its  contributions  from  the  atomic  subregions. 

4.  Theoretical  Results.  Optimal  convergence  has  been  established  for  ALG1 
and  ALG2,  with  a  coarse  space  VH  and  GMRES  acceleration,  using  essentially  only 
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our  older  theory;  see  [2,  3]  or  [11,  Chapter  5.4].  Our  result  is  also  valid  for  ALG3  in 
the  case  when  there  are  no  cross  points.  Our  current  proofs  require  H2  —  regularity 
and  that  k3H 2  is  sufficiently  small,  i.e.  the  phase  error  of  the  coarse  space  solution  is 
small  enough.  (We  believe  that  the  H2  —  regularity  can  be  weakened  at  the  expense  of 
a  more  severe  restriction  on  k  and  H .)  We  note  that  while  these  types  of  conditions 
are  meaningful  asymptotically,  since  our  results  show  that  the  number  of  iterations  will 
be  independent  of  h,  only  experiments  can  tell  if  the  restriction  imposed  on  H  makes 
our  results  irrelevant  for  a  choice  of  mesh  points  that  corresponds  to  a  realistic  number 
of  mesh  points  per  wave  length.  We  also  note  that  our  current  theory  fails  to  explain 
the  quite  satisfactory  performance  that  we  have  observed  in  many  of  our  experiments 
even  without  a  coarse  correction. 

The  bound  for  ALG1  is  independent  of  k  and  h  while  that  of  ALG2  deteriorates 
linearly  with  the  number  of  points  per  wave  length. 

In  view  of  our  results  and  the  formulas  for  the  phase  error,  we  have  made  series  of 
experiments  with  a  fixed  k3H 2  as  well  as  k3h2 . 

5.  Numerical  Results.  The  software  used  was  developed  with  the  numerical 
linear  algebra  library  PETSc  [1]  supplied  by  Argonne  National  laboratories  and  Matlab 
(TM),  a  product  of  Math  works,  Inc.  We  would  like  to  acknowledge  the  generous  help 
of  the  PETSc  implementors  in  developing  and  debugging  our  code.  The  platforms  for 
our  computations  are  a  Silicon  Graphics  Reality  Monster  (TM)  parallel  computer  at 
Argonne  National  Laboratories  and  local  workstations. 

We  now  describe  the  geometry  and  discretization  used  in  our  numerical  experi¬ 
ments.  In  all  cases  0  is  a  unit  square  discretized  with  Q\  elements,  and  the  subregions 
Q  j  are  built  from  a  decomposition  of  Q  into  nonoverlapping  square  subregions  with  a 
layer  of  S  elements  added  in  all  directions.  The  relevant  parameters  for  describing  an 
experiment  are: 

•  n  the  number  grid  points  per  side  of  the  square  fine  mesh,  nc  the  analog  for  the 
coarse  mesh;  h  =  l/(n  —  1)  with  Q  a  unit  square; 

•  k  the  spatial  frequency; 

•  the  number  of  subregions; 

•  5  the  number  of  elements  across  half  of  the  overlap; 

•  ppw  the  number  of  points  per  wave  length;  ppw  = 

The  number  of  points  per  wavelength  is  a  non-dimensional  measure  of  resolution. 

We  first  compare  the  performance  of  the  three  algorithms  on  a  2  X  2  set  of  square 
subregions  without  using  a  coarse  grid.  Here  and  below,  we  say  that  an  algorithm 
has  converged  at  a  given  iteration  if  the  £2  norm  of  the  residual  is  less  than  10-6  of 
that  of  the  original  residual.  We  also  discuss  the  relative  error  at  termination  of  the 
iteration  measured  by  the  £2  norm  of  the  difference  between  the  final  iterate  and  the 
exact  solution  of  the  discrete  system  divided  by  the  £2  norm  of  the  same  exact  solution. 
An  iteration  number  given  in  parentheses  indicates  divergence  and  the  iteration  number 
at  which  the  iteration  was  stopped. 

For  ALG1  GMRES  acceleration  had  to  be  used  at  all  times  to  obtain  convergence; 
moreover,  given  n  =  97  or  129,  <5  =  1,2,  3,  4,  or  5,  and  ppw  =  10  or  20,  ALG1  never 
converges  in  fewer  than  40  iterations.  Moreover,  the  relative  error  in  the  solution  always 
exceeds  10-4,  possibly  reflecting  ill-conditioning. 

For  ALG2,  we  obtain  convergence  provided  we  use  either  overlap  or  GMRES  ac¬ 
celeration.  Given  the  resolution  n  =  97  or  129  and  ppw  =  10,  ALG2  diverges  when 
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Table  1 
Table  for  ALG3 


s 

3 

2 

1 

nc=0; 

(101) 

3.87e-01 

31 

9.90e-07 

20 

6.91e-07 

k=  20.1 

15 

8.97e-07 

16 

6.84e-07 

17 

5.75e-07 

nc=9; 

54 

9.47e-07 

(101) 

1.15e+01 

(34) 

5.85e+01 

k=  20.1 

14 

5.99e-07 

14 

6.50e-07 

15 

7.05e-07 

nc=17; 

45 

9.58e-07 

20 

9.28e-07 

19 

9.53e-07 

k=  20.1 

13 

9.85e-07 

12 

7.44e-07 

11 

4.44e-07 

nc=33; 

42 

9.44e-07 

21 

7.33e-07 

17 

7.04e-07 

k=  20.1 

13 

4.17e-07 

12 

9.41e-07 

11 

4.28e-07 

S 

3 

2 

1 

nc=0; 

18 

9.29e-07 

21 

7.32e-07 

24 

7.15e-07 

k=  31.9 

15 

6.19e-07 

16 

6.38e-07 

18 

7.98e-07 

nc=17; 

19 

8.09e-07 

(101) 

5.49e-03 

(30) 

2.59e+01 

k=  31.9 

13 

6.60e-07 

14 

9.70e-07 

17 

5.72e-07 

nc=33; 

36 

9.12e-07 

17 

7.98e-07 

22 

8.42e-07 

k=  31.9 

13 

5.41e-07 

13 

4.86e-07 

13 

7.81e-07 

nc=65; 

18 

9.09e-07 

17 

6.87e-07 

23 

8.72e-07 

k=  31.9 

12 

6.20e-07 

12 

6.58e-07 

13 

9.02e-07 

S  =  0  unless  GMRES  acceleration  is  applied.  Even  with  acceleration  but  without  over¬ 
lap,  ALG2  appears  to  be  worse  than  ALG1.  With  the  smallest  overlap,  (5=1,  ALG2 
converges  without  acceleration.  The  convergence  rate,  about  42  iterations,  is  similar 
to  that  of  ALG1,  but  the  error  is  one  tenth  of  that  encountered  in  ALG1,  apparently 
reflecting  better  conditioning.  With  acceleration  and  <5=1,  ALG2  converges  in  13 
iterations  for  both  resolutions. 

ALG3  does  not  converge  at  all  without  overlap,  but  with  overlap  it  outperforms 
both  ALG1  and  ALG2.  Given  ppw  =  10  the  resolution  n  =  97  or  129  and  5  =  1,  2,  3 
or  4,  ALG3  consistently  converges  at  least  twice  as  fast  as  ALG2.  For  these  parameter 
values  ALG3  always  converges  in  fewer  than  10  iterations,  and  the  relative  error  is 
always  less  than  10-6.  Given  ppw  =  20,  whether  n  =  97  or  n  =  129,  the  results  are  the 
same  provided  that  the  ratio  of  5  to  ppw  is  maintained;  this  indicates  that  ALG3  has 
converged  with  respect  to  resolution.  However,  when  a  coarse  grid  is  used,  the  ratio  of 
5  to  ppw  ceases  to  be  accurate  determinant  of  performance. 

In  the  remaining  numerical  results  we  investigate  the  behavior  of  ALG3  in  the  case 
of  many  subregions.  Generally  speaking,  in  the  many  subregion  case  ALG3  needs  either 
GMRES  acceleration  or  a  sufficiently  fine  coarse  grid  to  converge.  Table  1  shows  the 
results  of  several  runs  with  ALG3.  For  the  two  sub-tables,  ppw  is  20.00  (above)  and 
25.20  (below),  and  the  mesh  size  is  65  X  65  (above)  and  129  X  129  (below).  The  two 
choices  of  parameters  are  related  by  a  constant  value  of  k3h2 .  In  all  cases  there  are  8x8 
subregions,  and  coarse  mesh  size  varies  as  indicated  in  the  left  column.  Within  each 
cell  are  presented  data  for  the  unaccelerated  algorithm  (upper  row),  the  accelerated 
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algorithm  (lower  row) ,  the  iteration  count  (left  column) ,  the  normalized  residual  (right 
column). 

In  all  cases  without  a  coarse  grid,  GMRES  forces  convergence  in  18  iterations  or 
less.  By  contrast,  the  algorithm  sometimes  fails  to  converge  when  neither  a  coarse  grid 
nor  GMRES  is  used.  Indeed,  other  experimental  results  suggest  that  with  a  larger 
number  of  subregions  (>  1,000)  convergence  without  a  coarse  grid  always  requires 
GMRES  acceleration. 

In  particular  for  a  run  not  shown  in  our  tables  with  ppw=40,  a  257  X  257  fine  grid, 
no  coarse  grid,  5  =  2,  and  a  32  X  32  array  of  overlapping  subregions,  the  accelerated 
algorithm  converges  in  74  steps  while  the  unaccelerated  algorithm  diverges.  A  coarse 
grid  correction  with  a  65  X  65  grid  causes  convergence  in  17  steps  for  the  accelerated 
algorithm  and  86  steps  for  the  unaccelerated  algorithm. 

Our  highest  resolution  computations,  so  far,  uses  a  385  X  385  fine  grid  with  a 
24  X  24  array  of  subregions,  5  =  2,  and  ppw  =  53.33.  Even  without  a  coarse  grid, 
the  unaccelerated  algorithm  converges  in  46  iterations;  with  GMRES  acceleration  the 
algorithm  converges  in  42  iterations.  With  a  97  X  97  coarse  grid,  the  unaccelerated 
algorithm  requires  22  iterations,  but  the  accelerated  algorithm  only  9  iterations. 

For  some  problems  that  we  consider  the  minimum  number  of  coarse  grid  nodes  per 
wavelength  required  for  the  coarse  grid  correction  to  be  helpful  has  been  as  small  as 
three,  but  this  number  appears  to  grow  somewhat  with  larger  values  of  k  and  larger 
values  of  subregions. 
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